A Large-Scale Multilingual Disambiguation of Glosses
Abstract
Linking concepts and named entities to knowledge bases has become a crucial Natural Language Understanding task. In this respect, recent works have shown the key advantage of exploiting textual definitions in various Natural Language Processing applications. However, to date there are no reliable large-scale corpora of sense-annotated textual definitions available to the research community. In this paper we present a large-scale high-quality corpus of disambiguated glosses in multiple languages, comprising sense annotations of both concepts and named entities from a unified sense inventory. Our approach for the construction and disambiguation of the corpus builds upon the structure of a large multilingual semantic network and a state-of-the-art disambiguation system; first, we gather complementary information of equivalent definitions across different languages to provide context for disambiguation, and then we combine it with a semantic similarity-based refinement. As a result we obtain a multilingual corpus of textual definitions featuring over 38 million definitions in 263 languages, and we make it freely available at http://lcl.uniroma1.it/disambiguated-glosses. Experiments on Open Information Extraction and Sense Clustering show how two state-of-the-art approaches improve their performance by integrating our disambiguated corpus into their pipeline.
Similar resources
Extending, Trimming and Fusing WordNet for Technical Documents
This paper describes a tool for the automatic extension and trimming of a multilingual WordNet database for cross-lingual retrieval and multilingual ontology building in intranets and domain-specific document collections. Hierarchies, built from automatically extracted terms and combined with the WordNet relations, are trimmed with a disambiguation method based on the document salience of the w...
Senseval-3 task: Word Sense Disambiguation of WordNet glosses
The SENSEVAL-3 task to perform word-sense disambiguation of WordNet glosses was designed to encourage development of technology to make use of standard lexical resources. The task was based on the availability of sense-disambiguated hand-tagged glosses created in the eXtended WordNet project. The hand-tagged glosses provided a “gold standard” for judging the performance of automated disambiguati...
A gloss-centered algorithm for disambiguation
The task of word sense disambiguation is to assign a sense label to a word in a passage. We report our algorithms and experiments for the two tasks that we participated in viz. the task of WSD of WordNet glosses and the task of WSD of English lexical sample. For both the tasks, we explore a method of sense disambiguation through a process of “comparing” the current context for a word against a ...
EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text
Parallel corpora are widely used in a variety of Natural Language Processing tasks, from Machine Translation to cross-lingual Word Sense Disambiguation, where parallel sentences can be exploited to automatically generate high-quality sense annotations on a large scale. In this paper we present EUROSENSE, a multilingual sense-annotated resource based on the joint disambiguation of the Europarl p...
Integrating Conceptual Density with WordNet Domains and CALD Glosses for Noun Sense Disambiguation
The lack of large, semantically annotated corpora is one of the main drawbacks of Word Sense Disambiguation systems. Unsupervised systems do not need such corpora and rely on the information of the WordNet ontology. In order to improve their performance, the use of other lexical resources needs to be investigated. This paper describes the effort to integrate the Conceptual Density approach with ...
Journal title:
- CoRR
Volume abs/1608.06718
Publication date 2016